ABSTRACT

Livingston's work is a careful analysis of what 
occurs when one pools two populations with different means, but 
similar variances and reliability coefficients. However, his work
fails to advance reliability theory for the special case of 
criterion-referenced testing. See ED 042 802 for Livingston's paper. 
(MS) 
It may be of value to present some comments about a recent development
in reliability theory [Livingston, 1971] even though Livingston’s material
is, as of this writing, in unpublished form. The reason for presenting these
comments now is that his work has become known to a number of people and is
being used for reliability estimation of so-called criterion-referenced tests.
For example, Hsu [1971], in a paper presented to the 1971 AERA Convention,
reports Livingston reliabilities for tests used at the University of Pittsburgh.
These comments are intended to clarify the meaning of a Livingston reliability
coefficient.

I shall not comment in detail on the Livingston paper. His algebra is
impeccable, and the formulas he derives follow exactly from his definitions
and from the assumptions of classical reliability theory. Instead, I shall
show that a Livingston reliability coefficient, given as

$$K^2(X, T_X) = \frac{\sigma^2(T_X) + (\mu_X - C_X)^2}{\sigma^2(X) + (\mu_X - C_X)^2},$$

is identical with a conventional reliability coefficient when that coefficient
is based on two populations with means equally distant above and below $C_X$.

For this to be true it is necessary that $\sigma^2(T_X)$ and $\sigma^2(E)$ be identical in the
two populations, which thus implies that the conventional reliability
coefficient for either population,

$$\frac{\sigma^2(T_X)}{\sigma^2(T_X) + \sigma^2(E)},$$

is identical in the two populations. Also, the classical assumption of
independence of true score and error must hold in both populations, $\mu_X$ may
be taken as the mean of either population, and then $C_X$, which Livingston
designates as a criterion score, must be the mean of the two populations
pooled.

Given two populations with $\sigma^2(X_1) = \sigma^2(X_2) = \sigma^2(X)$, with
$\sigma^2(T_{X_1}) = \sigma^2(T_{X_2}) = \sigma^2(T_X)$, and with
$\sigma^2(E_1) = \sigma^2(E_2) = \sigma^2(E)$, but with means $\mu_1 \neq \mu_2$,
we can write the variance of observed scores for the pooled populations as

$$\sigma^2(X) + \frac{\mu_1^2}{2} + \frac{\mu_2^2}{2} - \frac{(\mu_1 + \mu_2)^2}{4}.$$



This follows by determining the expected value of the squared deviations of
the scores in the two populations from $\mu_p$, which is the mean of the pooled
(equally weighted) populations and is identical with $(\mu_1 + \mu_2)/2$.

Now $\frac{\mu_1^2}{2} + \frac{\mu_2^2}{2} - \frac{(\mu_1 + \mu_2)^2}{4}$ has an equivalent form equal to either
$(\mu_1 - \mu_p)^2$ or $(\mu_2 - \mu_p)^2$, both of which have the same value when $\mu_p$ lies
half way between $\mu_1$ and $\mu_2$. Consequently the variance of the two populations
pooled is

$$\sigma^2(X) + (\mu_1 - \mu_p)^2.$$
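
The equivalence used in this step can be spelled out. The following is a short worked expansion, writing $\mu_1$ and $\mu_2$ for the two population means and $\mu_p = (\mu_1 + \mu_2)/2$ for the mean of the equally weighted pool; the intermediate form $(\mu_1 - \mu_2)^2/4$ is supplied here only as a connecting step.

```latex
\begin{align*}
\frac{\mu_1^2}{2} + \frac{\mu_2^2}{2} - \frac{(\mu_1 + \mu_2)^2}{4}
  &= \frac{2\mu_1^2 + 2\mu_2^2 - \mu_1^2 - 2\mu_1\mu_2 - \mu_2^2}{4} \\
  &= \frac{(\mu_1 - \mu_2)^2}{4} \\
  &= \left(\mu_1 - \frac{\mu_1 + \mu_2}{2}\right)^2
   = (\mu_1 - \mu_p)^2 = (\mu_2 - \mu_p)^2 .
\end{align*}
```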

Since classic true scores have the same mean as observed scores, by a similar
argument the variance of true scores in the two populations pooled is
$\sigma^2(T_X) + (\mu_1 - \mu_p)^2$. Therefore the classic reliability coefficient based on
these two populations pooled is

$$\frac{\sigma^2(T_X) + (\mu_1 - \mu_p)^2}{\sigma^2(X) + (\mu_1 - \mu_p)^2},$$

which is Livingston’s coefficient when we identify his $\mu_X$ with our $\mu_1$ and
his $C_X$ with our $\mu_p$.
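
The identity just derived can be checked numerically. The sketch below is only an illustration: the function names and the variance values (true-score variance 80, error variance 20) are assumptions chosen for the example and do not come from Livingston's paper. Exact fractions are used so that the two coefficients can be compared for strict equality.

```python
from fractions import Fraction


def livingston_coefficient(var_true, var_error, mean, criterion):
    """K^2(X, T_X) = [var(T) + (mean - C)^2] / [var(X) + (mean - C)^2]."""
    # Classical assumption: true score and error are independent,
    # so observed variance is the sum of the two.
    var_obs = Fraction(var_true) + Fraction(var_error)
    d2 = Fraction(mean - criterion) ** 2
    return (var_true + d2) / (var_obs + d2)


def pooled_conventional_reliability(var_true, var_error, mean_1, mean_2):
    """Conventional reliability for two equally weighted populations pooled."""
    mean_p = Fraction(mean_1 + mean_2, 2)
    # Variance added by pooling: expected squared deviation of the
    # population means from the pooled mean.
    between = (Fraction(1, 2) * (mean_1 - mean_p) ** 2
               + Fraction(1, 2) * (mean_2 - mean_p) ** 2)
    pooled_true = var_true + between
    pooled_obs = var_true + var_error + between
    return pooled_true / pooled_obs


# Hypothetical variances; means 60 and 90 lie equally far from C = 75.
k2 = livingston_coefficient(80, 20, mean=60, criterion=75)
pooled = pooled_conventional_reliability(80, 20, mean_1=60, mean_2=90)
print(k2, pooled)  # both print 61/65
```

With these assumed values the conventional coefficient within either population alone is 80/100, while both the Livingston coefficient and the pooled conventional coefficient equal 61/65.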
Livingston offers this illustration or interpretation of his coefficient.
Suppose an employer wishes to employ workers who score at least 75 on a given
test. The criterion score, $C_X$, is set in advance as 75. If the employer
then tests a population with mean score, $\mu_X$, of 60 he is likely to reject
many of these persons. Livingston would, I believe, argue then that his
reliability coefficient, which incorporates the quantity $(60 - 75)^2$ in both the
numerator and the denominator of the formula, is the correct indicator of the
confidence we can have in believing that an individual in this population has
a true score less than 75.

Let us make two points. 

If we have available a population with a mean of 60 and another with a
mean of 90, both populations identical in the variance of observed, true, and
error scores, the conventional reliability coefficient based on the two
populations pooled will be

$$\frac{\sigma^2(T_X) + (60 - 75)^2}{\sigma^2(X) + (60 - 75)^2},$$

which is the Livingston coefficient based on the population with mean of 60
when $C_X$ is set at 75. Apparently then the Livingston coefficient requires
that in addition to the population with mean of 60 one must regard as reasonable
the postulation of a similar population with mean 15 points above the $C_X$ of 75,
or a mean of 90. One can readily conceive situations in which this is not
reasonable. Suppose the test has a ceiling of, say, 20 points and the criterion
is set at 16 points. If we use this test with a population having a
mean of 10, then there can be no population with a mean of 22, that is, 16 plus (16 - 10),
such that the pooling of the two populations gives a conventional reliability
coefficient equal to the Livingston coefficient. Both "floors" and "ceilings"
on tests are facts of life and they may be inconsistent with the procedure
Livingston recommends. In general, a reliability coefficient depends upon the 
range of talent. Theoretically at least, one can manipulate the magnitude of 
the reliability coefficient by altering the range of talent. Livingston appears 
to have secured his "bigger" coefficients as a consequence of implicitly ex- 
tending the range of talent. 
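
The ceiling argument reduces to a small calculation: the second population must have a mean as far above the criterion as the first is below it. The sketch below illustrates that check. The function names are hypothetical, and the floor of 0 and the 0 to 100 range used for the employer example are assumptions, since the text specifies only a ceiling of 20 for the second test.

```python
def mirror_mean(mean, criterion):
    """Mean the second population would need: C + (C - mean)."""
    return 2 * criterion - mean


def mirror_is_attainable(mean, criterion, floor, ceiling):
    """True if the mirrored mean lies within the possible score range."""
    return floor <= mirror_mean(mean, criterion) <= ceiling


# Employer example: mean 60, criterion 75 (score range 0-100 is assumed).
print(mirror_mean(60, 75))                    # 90
print(mirror_is_attainable(60, 75, 0, 100))   # True

# Ceiling example: mean 10, criterion 16, ceiling 20 (floor 0 is assumed).
print(mirror_mean(10, 16))                    # 22
print(mirror_is_attainable(10, 16, 0, 20))    # False
```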

The second point is that the variance of the errors of measurement may 
be a better guide to one's confidence in asserting that a given individual has 
a true score below (or above) a certain point than is the magnitude of the 
reliability coefficient. Again, at least theoretically, manipulating the 
magnitude of the reliability coefficient by altering the range of talent does 
not alter the variance of the errors of measurement. Textbooks commonly state 
that the standard error of measurement is independent of the range of talent. 
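Now it is almost painfully obvious that

$$\left[\sigma^2(X) + (\mu_X - C_X)^2\right]\left[1 - K^2(X, T_X)\right]$$

equals $\sigma^2(X) - \sigma^2(T_X)$, which is $\sigma^2(E)$, or the variance of the errors of measurement.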

Thus although Livingston's reliability coefficient is larger than the conventional
one, the standard error of measurement is the same, and consequently
this larger coefficient does not imply a more dependable determination of 
whether or not a true score falls below (or exceeds) a given criterion value. 
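
The invariance of the error variance can also be illustrated numerically. In the sketch below, the function name and the variance values (true-score variance 80, error variance 20) are assumptions made for the example. It recovers the error variance from the Livingston coefficient as the enlarged observed variance times one minus the coefficient, for several criterion scores.

```python
from fractions import Fraction


def error_variance_from_livingston(var_true, var_error, mean, criterion):
    """Recover var(E) as [var(X) + (mean - C)^2] * [1 - K^2(X, T_X)]."""
    d2 = Fraction(mean - criterion) ** 2
    var_obs = Fraction(var_true) + Fraction(var_error)  # T and E independent
    k2 = (var_true + d2) / (var_obs + d2)  # Livingston coefficient
    return (var_obs + d2) * (1 - k2)


# Hypothetical population: mean 60, var(T) = 80, var(E) = 20.
# Moving the criterion changes the coefficient but not the error variance.
for criterion in (60, 75, 90):
    print(criterion, error_variance_from_livingston(80, 20, 60, criterion))
# each line shows an error variance of 20
```

When the criterion equals the population mean, the squared distance vanishes and the Livingston coefficient reduces to the conventional one; the recovered error variance is 20 in every case.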

Viewed from this standpoint, Livingston's work appears to be primarily 
a careful spelling out of what occurs when one pools two populations with 
different means, but similar variances and (conventional) reliability coefficients.
If this view is correct, we must conclude that his work fails to